ABSTRACT

We present a method to infer the redshift distribution of an arbitrary
dataset based on spatial cross-correlation with a reference population,
and we apply it to various datasets across the electromagnetic spectrum.
Our approach advocates the use of clustering measurements on all available
scales, in contrast to previous works focusing only on linear scales. We
apply this technique to infer the redshift distributions of luminous red
galaxies and emission line galaxies from the SDSS, infrared sources from
WISE and radio sources from FIRST. We show that consistent redshift
distributions are found using both quasars and absorber systems as
reference populations. This technique promises to be widely applicable to
existing and upcoming sky surveys. It provides us with the invaluable
ability to deproject the inherently 2d observations of the extragalactic sky.



Subject headings: redshift - clustering 

1. INTRODUCTION 

Observations of the sky are inherently a two- 
dimensional measurement of photon flux density as a 
function of angular position. For astrophysical studies 
one usually needs to infer three-dimensional positions, 
for example to convert a brightness into a luminosity. 
This has been a long-standing limitation in astronomy. 

On extragalactic scales, distances are usually inferred
from redshift measurements using knowledge of the
expansion history of the Universe. A redshift can be
directly measured from observations when one can detect
and identify a high-contrast spectroscopic feature.
Consequently, robust redshift measurements require
spectroscopic observations of sources with emission or
absorption lines at sufficient spectral resolution. Such
observations are usually expensive and restricted to bright
objects; for example, the Sloan Digital Sky Survey (SDSS;
Abazajian et al. 2009) has imaged ~100 million galaxies,
but only of order 1% have been followed up
spectroscopically, most of which are bright and nearby.

For the vast majority of galaxies, distance estimates
rely on so-called "photometric" redshifts. These use
observed broadband colors to probe the overall spectral
energy distribution (SED) of a source, and therefore rely
on qualitatively different information than spectroscopic
redshifts. Photometric redshift estimation suffers from a
number of limitations: intrinsic degeneracies between
colors and redshifts, arbitrary SED templates, dust
reddening, etc. Despite these limitations, all upcoming
imaging surveys rely on photometric redshifts. With deeper
surveys of the sky and access to new wavelength ranges
from space, the lack of robust distance estimates is
becoming a limitation. Moreover, given the speed at which
telescopes are mapping out the sky, the fraction of objects
for which we have spectra decreases with time.
Consequently, alternative techniques should be explored
to estimate cosmological redshifts.

In this paper, we show that the distance estimates from 
the tiny fraction of sources with spectroscopic or accu- 
rate photometric redshifts can be propagated statistically 
to other objects across different surveys using informa- 
tion extracted from spatial clustering. The main idea 
is that the large-scale cross-correlations between objects 
with existing distance measurements and an unknown 
astronomical sample can provide us with an estimate of 
the unknown sample's redshift distribution. 

While this direction has been explored previously by
several authors (Newman 2008; Benjamin et al. 2010;
Matthews & Newman 2010; Schulz 2010; Matthews &
Newman 2012; McQuinn & White 2013), in this paper
we show that this idea can be generalized by including
clustering information from all scales, leading to a
much higher sensitivity and a wider applicability. We
demonstrate the power of this technique by estimating
the redshift distributions of existing datasets, without
any knowledge of the source properties. We apply our
tool to various datasets across the electromagnetic
spectrum, from the optical to the radio range (where
photometric redshifts cannot even be defined), and
estimate the corresponding redshift distributions. A
companion paper (Schmidt et al. 2013) presents results
from numerical simulations to test the robustness and
limits of our redshift inference method when applied to
realistic distributions of dark matter halos and galaxies.

2. ESTIMATING REDSHIFTS 

2.1. The covariance of the sky 

Electromagnetic observations of the sky consist of a
measurement of flux density $F_\lambda$ as a function of
angular position. We denote the flux density fluctuation at
a location $\phi$ and wavelength $\lambda$ as
$\delta F_\lambda(\phi)$. The generic covariance of the
extragalactic sky is given by

$$ C_{\rm obs}(\lambda_1, \lambda_2, \theta) = \langle \delta F_{\lambda_1}(\phi)\, \delta F_{\lambda_2}(\phi + \theta) \rangle . \qquad (1) $$

This quantity is uniquely defined and provides us with 
the statistical information on the extragalactic sky as 
a function of position and wavelength. Given that the 
large-scale structure of the Universe approaches that of 
a Gaussian random field, this quantity captures a signif- 
icant fraction of the structure of the extragalactic sky. 

If we have access to a population of objects whose spatial
distribution $\delta(r)$ is located within a narrow redshift
bin centered on $z_0$, i.e. in the limit
$dN/dz' \propto \delta_D(z' - z_0)$, we can use it to probe a
projection of the flux density fluctuation $\delta F_\lambda$:

$$ C_{\rm obs}(\lambda, r_p, z) = \langle \delta(z)\, \delta F_\lambda(r_p) \rangle . \qquad (2) $$

If this correlation is significantly larger than expected
from random fluctuations, it implies that the field
$\delta F_\lambda$ contains sources at redshift $z$ (we first
ignore the small modulation possibly induced by
gravitational magnification and discuss it in section 2.4).
Here we note that the flux fluctuation $\delta F_\lambda$ can
be a continuous field (e.g. CMB temperature) or a discrete
one (e.g. galaxies). This indicates that it is possible to
extract statistical information on the redshift distribution
of an arbitrary dataset from measurements of this
cross-correlation as a function of redshift. To first order,
the correlation in Equation 2 can be used directly to test
for the absence of objects in $\delta F_\lambda$ at redshift $z$.

When using observed spectra to calibrate photometric
redshifts, one makes use of the correlation given in Eq. 2,
i.e. the correlation between a known redshift and the
observable $\delta F_\lambda$ (or similarly a color), but ignores
the spatial dependence. An important point of this paper is
that the environment (or projected environment) of a source
can be treated as an observable which, in a statistical
context, is a powerful indicator of its properties, including
its redshift. Due to overlapping objects along the line of
sight, the projected environment is often a noise-dominated
quantity. However, if one is interested in estimating the
redshift of an ensemble of objects, the mean projected
environment can become a signal-dominated quantity and a
useful source of information. We now show how to make
use of this information to infer a redshift distribution
for the ensemble of objects. A few authors have explored
this avenue (e.g. Newman 2008; Matthews & Newman 2010;
Schulz 2010; Matthews & Newman 2012; McQuinn &
White 2013) and considered approaching the problem from
a theoretical and global point of view. Here we take a
different approach, focusing on applicability to real-world
datasets and emphasizing the benefits of using small-scale
clustering information.

2.2. Redshift inference from spatial clustering 

Let us consider two populations of extragalactic objects:
a reference population for which we know the angular
positions and redshifts of each object, and an unknown
population for which angular positions are known but
redshifts are not. Let $n_i$ be the surface density of
objects and $\delta_i$ the corresponding density fluctuation,
$\delta_i = n_i/\langle n_i \rangle - 1$, where $\langle n_i \rangle$
is the mean density. The index $i$ refers to the reference
or unknown population.



We introduce the normalized redshift distribution

$$ \phi_i(z) = \frac{d^2N_i/dz\,d\Omega}{\int dz\, \left( d^2N_i/dz\,d\Omega \right)} , \qquad (3) $$

where $d^2N_i/dz\,d\Omega$ is the number of objects per unit
redshift and solid angle. The clustering of matter induces
correlations between the positions of overdensities. The
mean density of unknown sources at a given separation
from reference objects is given by

$$ \langle n_u(\theta, z) \rangle_r = \langle n_u \rangle \left[ 1 + w_{ur}(\theta, z) \right] \qquad (4) $$

where $w_{ur}(\theta, z)$ is the angular cross-correlation
function between all unknown objects and the reference
population at redshift $z$ (Peebles 1993). This is the basic
source of information we will use to infer redshifts.
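Eq. 4 suggests a direct way to estimate $w_{ur}(\theta, z)$: count unknown sources in annuli around the reference objects of one redshift slice and compare with the count expected for an unclustered population. The sketch below is illustrative only and is not the authors' pipeline. It assumes a small, flat patch of sky (positions in degrees, Euclidean separations), a random catalog `rand` that uniformly covers the same footprint as the unknown sample, and brute-force pair counting. The function names `pair_counts` and `cross_correlation` are hypothetical.

```python
import numpy as np


def pair_counts(a, b, edges):
    """Number of (a, b) pairs whose separation falls in each annulus.

    a, b: arrays of shape (N, 2) and (M, 2) holding flat-sky positions in degrees.
    Brute force, O(N * M) memory and time: fine for a sketch, not for a survey.
    """
    sep = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    counts, _ = np.histogram(sep.ravel(), bins=edges)
    return counts.astype(float)


def cross_correlation(ref, unk, rand, edges):
    """Estimate w_ur(theta) for one reference redshift slice.

    Simple DD/DR-type estimator: reference-unknown pairs divided by
    reference-random pairs, rescaled by the catalog sizes, minus one.
    This is the fractional excess of unknown sources around reference
    objects, i.e. the quantity w_ur in Eq. 4.
    """
    dd = pair_counts(ref, unk, edges)
    dr = pair_counts(ref, rand, edges)
    return dd / dr * (len(rand) / len(unk)) - 1.0
```

Because the same reference positions enter both pair counts, edge effects largely cancel as long as the randoms trace the unknown sample's footprint. A real analysis would use a tree-based pair counter and spherical geometry.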

Previous studies making use of spatial correlations to
infer redshift distributions focused only on large scales,
on which the galaxy bias is sufficiently linear. For
example, Newman (2008) proposed a method to recover
the redshift distribution of unknown populations using
measurements of the cross-correlation function on scales
greater than a few Mpc. Here, instead, we propose
to measure the amount of clustering by integrating
over all available scales. This approach allows us to
significantly increase the signal-to-noise ratio (S/N) of
the basic observable. Valuable information can be extracted
from this quantity alone; it can, for example, be used to
infer the existence or absence of sources at a given
redshift. As we will show below, it can also be used to
probe the local properties of an unknown redshift
distribution. While somewhat less accurate than the method
proposed by Newman (2008) and recently optimized by
McQuinn & White (2013), our proposed method is
significantly more sensitive and can be applied to
numerous datasets. From a practical point of view, we also
note that analyses of real-world data (for example in the
optical and infrared regimes) tend to be much less affected
by systematics on small scales, so working on scales
smaller than a degree (or a few projected Mpc) can be
quite valuable. In this paper, we advocate this approach
and demonstrate that cluster-based redshifts provide us
with a powerful exploratory tool.

As a measure of clustering we will consider the integrated
cross-correlation function

$$ \bar{w}_{ur}(z) = \int d\theta\, W(\theta)\, w_{ur}(\theta, z) \qquad (5) $$

where $W(\theta)$ is a weight function, whose integral is
normalized to unity, aimed at optimizing the overall S/N.

As the matter correlation function can often be
approximated by a power law, $w(\theta) \propto \theta^{-\gamma}$,
over a broad range of scales with $\gamma$ of order unity, we
can simply use $W(\theta) \propto \theta^{-\gamma}$. We note that
for $\gamma = 1$ there is an equal amount of clustering
information per logarithmic scale. This suggests that a
significant amount of information can be extracted from
small-scale measurements.
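In practice $w_{ur}(\theta, z)$ is measured in discrete angular bins, so the integral of Eq. 5 becomes a weighted sum. The sketch below is a minimal illustration under stated assumptions: $w_{ur}$ is taken as constant within each bin, and the weight is the power law $W(\theta) \propto \theta^{-\gamma}$, normalized to unity over the measured range. The bin edges and $\gamma$ are user choices, not values prescribed by the text, and the helper names are hypothetical.

```python
import numpy as np


def bin_weights(edges, gamma=1.0):
    """Integral of W(theta) ∝ theta**(-gamma) over each bin, normalized to sum to 1."""
    edges = np.asarray(edges, dtype=float)
    if np.isclose(gamma, 1.0):
        # theta**-1 integrates to log(theta): equal weight per logarithmic interval.
        w = np.log(edges[1:] / edges[:-1])
    else:
        p = 1.0 - gamma
        w = (edges[1:] ** p - edges[:-1] ** p) / p
    return w / w.sum()


def integrated_correlation(w_theta, edges, gamma=1.0):
    """Discrete version of Eq. 5: sum over bins of weight times w_ur(theta)."""
    return float(np.sum(bin_weights(edges, gamma) * np.asarray(w_theta, dtype=float)))
```

With $\gamma = 1$ and logarithmically spaced bins, every bin receives the same weight. Larger $\gamma$ shifts the weight toward the smallest scales.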

The integrated angular cross-correlation between the
reference sample and the unknown sample can be written
as

$$ \bar{w}_{ur} = \int dz'\, \phi_u(z')\, \phi_r(z')\, b_u(z')\, b_r(z')\, \bar{w}(z') \qquad (6) $$
where $\bar{w}$ is the integrated dark matter correlation
function (as defined in Eq. 5) and $b_u(z)$ and $b_r(z)$ are
the corresponding integrated biases, defined as the square
root of the ratios between the galaxy and dark matter
correlation functions. We now consider the case for which
$\phi_r(z') = \delta_D(z' - z)$. In practice, approaching this
limit requires selecting a reference population within a
redshift slice centered on $z$ with a narrow width $\delta z$.
If one wishes to detect a non-zero cross-correlation at that
redshift, this imposes a lower limit on the density of
reference objects in the sky. When focusing on small
angular scales, where measurements are limited by shot
noise, it is given by

$$ \frac{dN_r}{dz} > \frac{1}{\cdots\;\delta z} \qquad (7) $$

As an example, if we consider clustering measurements
on scales of 1 degree, this translates into

$$ \frac{dN_r}{dz} > 10^{\cdots} \left( \frac{100\ {\rm deg}^{-2}}{\cdots} \right) \left( \frac{0.1}{\cdots} \right) \qquad (8) $$



As we will show in section 2.3, various datasets satisfy
this criterion. We note that the detectability condition
given in Eq. 7 is not required for redshift inference: the
method can also be used to generate a large number of
noise-dominated cross-correlation estimates, which can then
be analyzed statistically. In the limit of a narrow reference
redshift slice, the redshift distribution of the unknown
sample is



$$ \phi_u(z) = \bar{w}_{ur}(z) \times \frac{1}{b_u(z)\, b_r(z)\, \bar{w}(z)} . \qquad (9) $$



In this equation, the only unknown quantity is $b_u(z)$.
The integrated dark matter correlation function $\bar{w}(z)$
can be estimated, and the redshift-dependent bias of the
reference sample can be obtained from a measurement of
its auto-correlation function:

$$ \bar{w}_{rr}(z) = b_r^2(z)\, \bar{w}(z) . \qquad (10) $$



While this relation is valid only on scales where galaxies
are linearly biased with respect to the dark matter field,
the inclusion of smaller scales leads to only a modest
departure from it. We demonstrate this point in our
companion paper (Schmidt et al. 2013) using numerical
simulations. One reason for this is that the scale
dependence of the bias is usually a slowly varying quantity.
Moreover, our estimate is based on an average over a wide
range of scales, which weakens non-linear effects.
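Eqs. 9 and 10 can be combined to eliminate the reference bias. Under the same linear-bias assumption as Eq. 10, $b_r$ follows from the reference auto-correlation, and substituting it into Eq. 9 leaves only measured quantities, the dark matter term and the unknown bias:

```latex
\begin{aligned}
b_r(z) &= \left[ \frac{\bar{w}_{rr}(z)}{\bar{w}(z)} \right]^{1/2}, \\
\phi_u(z)\, b_u(z) &= \frac{\bar{w}_{ur}(z)}{b_r(z)\, \bar{w}(z)}
  = \frac{\bar{w}_{ur}(z)}{\sqrt{\bar{w}_{rr}(z)\, \bar{w}(z)}} .
\end{aligned}
```

Only the product $\phi_u b_u$ is constrained in this way; the bias of the unknown sample still has to be handled separately.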

It is important to realize that the degree of variation of
each term in Eq. 9 is expected to differ. We denote by
$\Delta z$ the range over which $\phi_u(z)$ is greater than zero.
If over this range the relative variation of $\phi_u(z)$
dominates over that of $b_u(z)$, or in other words if

$$ \left| \frac{d \log \phi_u}{dz} \right| \gg \left| \frac{d \log b_u}{dz} \right| , \qquad (11) $$

then, over the interval $\Delta z$, we have

$$ \phi_u(z) \propto \bar{w}_{ur}(z) \times \frac{1}{b_r(z)\, \bar{w}(z)} . \qquad (12) $$

The more limited the interval $\Delta z$, the larger the
relative variation of $\phi_u(z)$ and the smaller the variation
of the other quantities over that interval. The
proportionality constant depends on the value of the
integrated bias of the unknown sample. However, if we know
that all objects of this sample contribute to the spatial
cross-correlations with the reference sample, then we can
simply normalize the redshift distribution using

$$ \int dz\, d\Omega\, \phi_u(z) = N_u . \qquad (13) $$

This relation is typically satisfied if all the objects of the
unknown sample are extragalactic and if the redshift
distribution of the reference population is wide enough to
cover the redshift range of all unknown objects. This
implies that, in the case of a narrow redshift distribution,
it is possible to infer it without knowledge of the bias
of the unknown population. Below, we discuss observational
strategies for reducing the redshift support $\Delta z$.
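Putting Eqs. 10, 12 and 13 together, the recovery reduces to a few array operations on a redshift grid. The sketch below is a hedged illustration, not the authors' code. It assumes that the integrated correlations $\bar{w}_{ur}(z)$ and $\bar{w}_{rr}(z)$ and the dark matter term $\bar{w}(z)$ are already available on a common grid, and that $b_u(z)$ varies slowly (Eq. 11). By default it normalizes the result to unit integral over redshift, as in the normalized definition of Eq. 3; the `total` parameter can be changed to match another convention. All names are hypothetical.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal integral of y(x), written out to avoid NumPy version differences."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def recover_phi(z, wbar_ur, wbar_rr, wbar_dm, total=1.0):
    """Clustering-based estimate of phi_u(z), assuming b_u is nearly constant.

    b_r is taken from the reference auto-correlation (Eq. 10),
    phi_u is proportional to wbar_ur / (b_r * wbar_dm) (Eq. 12),
    and the overall constant is fixed by requiring the integral over z to equal `total`.
    """
    wbar_ur = np.asarray(wbar_ur, dtype=float)
    wbar_dm = np.asarray(wbar_dm, dtype=float)
    b_r = np.sqrt(np.asarray(wbar_rr, dtype=float) / wbar_dm)
    phi = wbar_ur / (b_r * wbar_dm)
    return phi * total / trapezoid(phi, z)
```

With a constant $b_u$ the recovery is exact up to noise. A slowly evolving $b_u$ only tilts a narrow distribution slightly, which is the regime Eq. 11 describes.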

If the redshift evolution of the bias $b_u(z)$ over $\Delta z$
is not negligible compared to the variation of $\phi_u(z)$,
this method only allows us to estimate the product
$\phi_u(z)\, b_u(z)$. Additional information from the
auto-correlation of the unknown sample can be used to
attempt to break the degeneracy between these two
quantities (see for example Newman 2008). In the present
study we focus on a local sampling of the redshift
distribution of an unknown population, as opposed to the
methods proposed to infer the global distribution.
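The statement that only $\phi_u b_u$ is accessible can be checked directly on Eq. 6. The sketch below evaluates Eq. 6 for an idealized reference population that is uniform in a top-hat slice of width $\delta z$ (an assumed $\phi_r$), using made-up toy functions for $\phi_u$, $b_u$, $b_r$ and $\bar{w}$. As $\delta z$ shrinks, $\bar{w}_{ur}$ approaches $\phi_u b_u b_r \bar{w}$ evaluated at the slice center, which is the narrow-slice limit behind Eq. 9. For wider slices the unknown distribution is averaged over the slice. The function name is hypothetical.

```python
import numpy as np


def wbar_ur_tophat(z_c, dz, phi_u, b_u, b_r, wbar_dm, n=2001):
    """Evaluate Eq. 6 for a reference sample uniform in [z_c - dz/2, z_c + dz/2].

    phi_u, b_u, b_r and wbar_dm are callables of redshift (toy models here).
    With a normalized top-hat phi_r = 1/dz, the integral of Eq. 6 is the slice
    average of the integrand, approximated by the mean over uniform samples.
    """
    zs = np.linspace(z_c - dz / 2, z_c + dz / 2, n)
    integrand = phi_u(zs) * b_u(zs) * b_r(zs) * wbar_dm(zs)
    return float(integrand.mean())
```

Comparing a very narrow slice with a wide one on a sharply peaked $\phi_u$ shows the smoothing directly: the wide slice returns a lower amplitude at the peak.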

In the general case, Eq. 12 provides us with an estimator
testing for the absence or existence of sources at a given
redshift $z$, i.e. a data-driven approach to redshift
estimation which can be applied to any continuous or
discrete dataset. When probing sources for which spectral
energy distribution templates are not available (for
example because the physics of the objects is not
understood) or for which no spectroscopic data are
available, the cluster-based redshift estimation proposed in
this paper provides us with a robust way to infer the
presence or absence of sources as a function of redshift,
without any assumption about their spectral properties.

2.3. Data analysis strategy 

The method presented in the previous section is best
suited for probing the redshifts of an unknown population
whose objects lie only within some limited redshift interval
$\Delta z$. In practice, sky surveys often provide us with a
series of observables for each source (e.g. brightness,
colors, size, shape, etc.). In this case, the best approach
to characterizing the redshift distribution of a given
dataset is to first select subsamples in the space of all
observable parameters. Each subsample will, by
construction, occupy a redshift interval no wider than, and
typically narrower than, that of the entire population. The
more parameters are available, the more likely it is to
identify regions of that space mapping to narrow redshift
intervals. The ability to select subsamples in the
corresponding multidimensional parameter space is
therefore important.
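One simple way to carry out this selection is to partition the catalog into the cells of a grid in two observables, for example two colors, and to treat each cell as a separate "unknown" sample to be cross-correlated with the reference slices. The sketch below assumes a rectangular grid with user-chosen bin edges. Adaptive or higher-dimensional partitions would follow the same pattern. The function name is hypothetical.

```python
import numpy as np


def grid_subsamples(x, y, x_edges, y_edges):
    """Group source indices by the (x, y) grid cell they fall in.

    Returns a dict mapping (i, j) cell indices to arrays of source indices.
    Sources outside the grid (or exactly on its upper edge) are dropped.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.digitize returns k with edges[k-1] <= value < edges[k]; shift to 0-based bins.
    ix = np.digitize(x, x_edges) - 1
    iy = np.digitize(y, y_edges) - 1
    ok = (ix >= 0) & (ix < len(x_edges) - 1) & (iy >= 0) & (iy < len(y_edges) - 1)
    cells = {}
    for idx in np.flatnonzero(ok):
        cells.setdefault((int(ix[idx]), int(iy[idx])), []).append(idx)
    return {k: np.array(v) for k, v in cells.items()}
```

Each returned index array can be passed to a cross-correlation routine as its own unknown sample. Cells whose recovered distributions are narrow are the ones that satisfy Eq. 11 best.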

In addition, we note that the finer the redshift sampling
provided by the reference population, the more likely we
are to detect high-contrast features in the redshift
distribution of the unknown sample. The width $\delta z$ of
the reference redshift slices should be as small as
possible. Eq. 7 shows that datasets with thousands of
objects per unit redshift or more are required to detect a
typical cross-correlation signal. Interestingly, we now have
access to a variety of surveys providing us with 3d
positions (based on spectroscopic redshifts or, in some
cases, sufficiently accurate photometric redshifts) which
are large enough. As an illustration, we show in figure 1
a compilation of samples drawn from the Sloan Digital
Sky Survey (SDSS; Abazajian et al. 2009) for which the
redshift distributions are known. The figure includes
distributions for galaxies, quasars and absorber systems. As
can be seen, the usability criterion given in Eq. 7 is met
by numerous samples. This figure also shows that
different populations can be used to check the consistency
of the inferred redshift distributions.

Fig. 1. — Compilation of samples from the SDSS for which we
have a robust 3d position, either from spectroscopic or photometric
redshifts. In this paper we make use of the spectroscopic samples
of quasars and Mg II absorbers as shown with the dark blue and
brown curves.

In the next section we will make use of the spectroscopic
quasar and absorber samples as reference populations.
These are shown with the dark blue and brown curves,
respectively. While SDSS quasars are found at roughly all
redshifts up to $z \sim 6$, Mg II absorbers are only visible
in the range $0.4 < z < 2.2$.

2.4. Gravitational lensing effects 

The apparent spatial density of sources in the sky is
modulated by gravitational magnification effects due to
the matter distribution along the line of sight (e.g. Narayan
1989). This induces an apparent correlation between
populations of objects lying at different redshifts. The
amplitude of this effect, also called cosmic magnification,
has been estimated by several authors (see Bartelmann &
Schneider 2001), and it has been detected in the large-scale
distribution of galaxies by Scranton et al. (2005) and
Menard (2010). For sources at high redshift lensed by
typical galaxies at $z \sim 0.5$, the amplitude of the
magnification effect is about 1% on a scale of one
arcminute. In general, this is negligible compared to the
signal induced by physical clustering of overlapping
samples. In addition, the lensing efficiency varies slowly
with redshift, so magnification would add a smooth, broad
contribution to the inferred distribution. The absence of
such a signature in the redshift distribution inferred by the
spatial cross-correlation technique directly indicates that
cosmic magnification effects are not playing a significant
role.

3. APPLICATION TO DATA 

We now apply our method to estimate the redshift
distribution of several populations: (i) Luminous Red
Galaxies (LRGs), for which accurate photometric redshifts
are available for comparison; (ii) Emission Line Galaxies
(ELGs), for which photometric redshifts are more difficult
to estimate due to the presence of strong emission lines;
(iii) infrared sources from the WISE survey; and (iv) radio
sources from the FIRST survey, for which photometric
redshifts are difficult to define from a single radio flux
density. In the first two cases we will use both
spectroscopic quasars and Mg II absorbers as reference
samples, specifically the SDSS DR7 quasar catalog
(Schneider et al. 2010) and the DR7 Mg II catalog compiled
by Zhu & Menard (2012). These two samples have
different bias evolution profiles, so comparing recoveries
on the same unknown sample is a good test of whether our
technique is insensitive to the reference sample's bias.

We measure spatial cross-correlations between each
'unknown' sample and the two spectroscopic populations,
integrating over physical scales ranging from zero to 1
Mpc, using a simple weight function $W(r) \propto 1/r$. Our
estimator for the redshift distribution $\phi(z)$ is simply
normalized according to Eq. 13. Our goal here is not to
construct an optimal estimator but to demonstrate that this
technique provides us with a new type of information on
redshift distributions, independent of what is obtained
through photometric redshifts.
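Working at fixed physical scale means that the angular annuli change with the redshift of each reference slice. The minimal sketch below assumes a flat $\Lambda$CDM cosmology with $H_0 = 70$ km/s/Mpc and $\Omega_m = 0.3$ (illustrative values, not taken from the text) and converts physical radii to angles through the angular diameter distance. It also computes bin weights for $W(r) \propto 1/r$. Because $\int dr/r$ diverges as $r \to 0$, the sketch requires a non-zero inner radius to normalize the weight; that inner cutoff is an assumption of the sketch. All names are hypothetical.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s


def angular_diameter_distance(z, h0=70.0, om=0.3, n=2001):
    """D_A(z) in Mpc for a flat LambdaCDM cosmology (trapezoidal integration, no radiation)."""
    zz = np.linspace(0.0, z, n)
    inv_e = 1.0 / np.sqrt(om * (1.0 + zz) ** 3 + (1.0 - om))
    d_c = (C_KM_S / h0) * np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zz))
    return d_c / (1.0 + z)


def angular_edges_deg(z, r_edges_mpc, **cosmo):
    """Angular bin edges (degrees) subtending the given physical radii at redshift z > 0."""
    r = np.asarray(r_edges_mpc, dtype=float)
    return np.degrees(r / angular_diameter_distance(z, **cosmo))


def inverse_r_weights(r_edges_mpc):
    """Bin weights for W(r) ∝ 1/r, normalized to unity; requires r_edges[0] > 0."""
    r = np.asarray(r_edges_mpc, dtype=float)
    if r[0] <= 0:
        raise ValueError("W(r) ∝ 1/r cannot be normalized down to r = 0")
    w = np.log(r[1:] / r[:-1])
    return w / w.sum()
```

For instance, 1 Mpc at $z = 0.5$ corresponds to roughly 0.045 degrees in this cosmology, so the same physical annuli shrink on the sky as the reference slice moves to higher redshift (up to the turnover of $D_A$).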

When the inferred redshift distribution is broad, we
need to take into account the redshift dependence of the
bias of the reference population. For these recoveries,
we use only our quasar sample, taking our bias evolution
from Porciani & Norberg (2006):

with $\gamma = 4$ to provide a better fit to the high-redshift
quasar clustering measurements (Shen et al. 2012).

3.1. Luminous Red Galaxies 

We now apply our technique to the MegaZ-LRG sample
(Collister et al. 2007). This catalogue contains about one
million SDSS Luminous Red Galaxies with robust
photometric redshifts. This sample spans the redshift
range $0.4 < z < 0.7$ with limiting magnitude $i < 20$. The
2dF-SDSS LRG and Quasar (2SLAQ; Cannon et al. 2006)
spectroscopic redshift catalogue of 13,000
intermediate-redshift LRGs provides a photometric redshift
training set; the rms photometric redshift accuracy obtained
for an evaluation set selected from the 2SLAQ sample is
$\sigma_z = 0.049$ averaged over all galaxies. The distribution
of photometric redshifts is shown in Figure 2 with the solid
line.

We measure the spatial cross-correlation between
LRGs and quasars as a function of redshift, and use it to
estimate the LRG redshift distribution. The results are
shown with the black data points in the left panel of
Figure 2. They demonstrate that the overall shape of the
LRG redshift distribution is properly recovered. In addition,
the results show that the MegaZ-LRG sample is not
significantly contaminated by galaxies at other redshifts in
the range probed by the quasars.

Fig. 2. — Redshift distributions of Luminous Red Galaxies (LRGs). In both panels the solid red line shows the distribution of LRG
photometric redshifts. Left: cluster-z distribution (black points) obtained by measuring the spatial cross-correlation between LRGs and
SDSS quasars. Right: cluster-z distribution (black points) obtained by measuring the spatial cross-correlation between LRGs and Mg II
absorbers, spanning the range 0.4 < z < 2.

Fig. 3. — Redshift distributions of Emission Line Galaxies (ELGs) from the SDSS. Left: cluster-z distribution (black points) obtained by
measuring the spatial cross-correlation with SDSS quasars. Right: cluster-z distribution (black points) obtained by measuring the spatial
cross-correlation with Mg II absorbers, spanning the range 0.4 < z < 2.

Fig. 4. — Left: Redshift distributions of three subsamples of WISE sources obtained by measuring their spatial cross-correlation with
SDSS quasars. We show the selection criteria for red (Sample 1), blue (Sample 2) and green (Sample 3) samples in Eq. 16. Right: Redshift
distribution of FIRST radio sources obtained by measuring their spatial cross-correlation with SDSS quasars. We observe the existence of
sources up to z ~ 3 as well as a bimodal redshift distribution.

We then repeat our measurement replacing the quasars 
with Mg II absorbers. The results, as shown in the right 
panel of Figure 2, are again in good agreement with the 
photometric redshift distribution. This provides us with 
an estimate independent from that obtained with the 
quasars and shows that different reference samples can 
be used to obtain consistent results. 

3.2. Emission Line Galaxies 

We now apply our redshift estimation technique to the
so-called Emission Line Galaxies (ELGs) from the SDSS
(Comparat et al. 2013). This corresponds to a sample of
faint blue galaxies for which the broadband colors are
dominated by emission lines. Following these authors, we
have selected the galaxies from the SDSS DR7 database
with:

$$
\begin{aligned}
i &< 21.5 \\
g - r &< 1.0 \\
r - i &> -0.917\,(g - r) + 0.683 \\
r - z &> 0.5\,(g - r) + 0.4
\end{aligned}
\qquad (15)
$$
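The color selection of Eq. 15 translates directly into a boolean mask on the SDSS magnitudes. The sketch below assumes arrays of $g$, $r$, $i$, $z$ magnitudes for the same objects and uses the thresholds of Eq. 15; they should be checked against Comparat et al. (2013) before any real use. The function name is hypothetical.

```python
import numpy as np


def select_elg(g, r, i, z):
    """Boolean mask implementing the ELG color cuts of Eq. 15."""
    g, r, i, z = (np.asarray(m, dtype=float) for m in (g, r, i, z))
    gr = g - r
    return (
        (i < 21.5)                          # magnitude limit
        & (gr < 1.0)                        # exclude red g - r colors
        & (r - i > -0.917 * gr + 0.683)     # tilted cut in the (g-r, r-i) plane
        & (r - z > 0.5 * gr + 0.4)          # tilted cut in the (g-r, r-z) plane
    )
```

The resulting mask defines the "unknown" sample that is then cross-correlated with the quasar and Mg II reference slices.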

Using SDSS DR7, this provides us with a sample of
about 2.6 million galaxies. We measure the spatial
cross-correlation between these ELGs and quasars as a
function of redshift and use it to estimate the redshift
distribution of the population. The results are shown with
the black data points in the left panel of Figure 3. They
indicate that the ELG redshift distribution is bimodal, with
a main population located at $z \sim 0.6$ and a second group
located at lower redshift.

We also measure the spatial cross-correlation between
ELGs and Mg II absorbers as a function of redshift.
Again, the recovered redshift distribution is in good
agreement with that obtained from the spectroscopic
quasars. In this case, the overall normalization given by
Eq. 13 does not properly apply, as the redshift coverage of
the absorbers is not wide enough to probe the redshifts
of all unknown sources. As a result, the amplitude of
$\phi_u(z \sim 0.7)$ obtained with the Mg II absorber systems
is higher than the more reliable one obtained with
quasars as the reference population.

Because the redshift distribution is not simple and we
are most likely observing two distinct populations of
galaxies with different biases, we cannot make any strong
claims about the relative numbers of the low- and
high-redshift populations. However, with the redshift
recovery technique at our disposal, we are not limited to
accepting these results as final. By iterating between
selection cuts and recoveries, we could, for instance, tune
the selection criteria in Equation 15 to exclude the
low-redshift population.

3.3. The WISE infrared survey 

The Wide-Field Infrared Survey Explorer (WISE;
Wright et al. 2010) is a mid-infrared survey satellite
which provides us with all-sky observations in four bands,
centered at 3.4, 4.6, 12, and 22 μm (W1 to W4,
hereafter). In order to maintain a homogeneous WISE
sample, we first select WISE sources with the magnitude cut
$[W1] < 16.5$.

For illustration purposes we work with three subsamples
selected in the $([W2] - [W3])$ versus $([W1] - [W2])$
color-color space:

$$
\begin{aligned}
&\text{Sample 1:} && 2 < [W_{2-3}] < 2.5, && 0.9 < [W_{1-2}] < 1.2 \\
&\text{Sample 2:} && 2.5 < [W_{2-3}] < 3, && 1.5 < [W_{1-2}] < 1.8 \\
&\text{Sample 3:} && 3.5 < [W_{2-3}] < 4, && 1.2 < [W_{1-2}] < 1.5
\end{aligned}
\qquad (16)
$$


where $[W_{i-j}] = [W_i] - [W_j]$. We then cross-correlate
these subsamples against the SDSS QSOs and find a clear
trend with redshift. In the left panel of Figure 4, we
present the redshift distributions of these three subsamples
obtained by cross-correlation with QSOs as our reference
sample, shown with different colors: Sample 1 (red),
Sample 2 (blue) and Sample 3 (green). While these samples
represent only a small fraction of the WISE data, they show
that even simple color cuts may be sufficient for selecting
non-overlapping samples for cosmological tests (e.g.
lensing). A future paper will explore the redshift
distribution of the WISE data in more detail.
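The three WISE color boxes of Eq. 16, together with the $[W1] < 16.5$ magnitude cut, can be encoded as a small lookup table. The sketch below returns 1, 2 or 3 for sources inside a box and 0 otherwise, treating the box edges as strict inequalities as written in Eq. 16. The names `WISE_BOXES` and `wise_sample` are hypothetical.

```python
import numpy as np

# (w23_min, w23_max, w12_min, w12_max) for Samples 1-3 of Eq. 16
WISE_BOXES = {
    1: (2.0, 2.5, 0.9, 1.2),
    2: (2.5, 3.0, 1.5, 1.8),
    3: (3.5, 4.0, 1.2, 1.5),
}


def wise_sample(w1, w2, w3, w1_max=16.5):
    """Label each source with its WISE color-box sample (0 = none or too faint)."""
    w1, w2, w3 = (np.asarray(m, dtype=float) for m in (w1, w2, w3))
    w12, w23 = w1 - w2, w2 - w3
    label = np.zeros(w1.shape, dtype=int)
    bright = w1 < w1_max  # homogeneity cut [W1] < 16.5
    for k, (a, b, c, d) in WISE_BOXES.items():
        inside = bright & (w23 > a) & (w23 < b) & (w12 > c) & (w12 < d)
        label[inside] = k
    return label
```

The boxes do not overlap, so each source receives at most one label. Sources labeled 0 are simply not used in these three subsamples.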

3.4. The FIRST radio survey 

The Faint Images of the Radio Sky at Twenty cm survey
(FIRST; Becker et al. 1995) uses the Very Large Array
(VLA) to produce a map of the 20 cm (1.4 GHz) sky
with a beam size of 5.4" and an rms sensitivity of about
0.15 mJy/beam. The survey covers an area of about
10,000 deg² in the north Galactic cap and a smaller area
along the celestial equator, both of which roughly coincide
with the regions observed by SDSS. With a source surface
density of ~90 deg⁻², the final catalog includes about one
million objects.

Using our spectroscopic quasar catalog and correcting
for bias evolution as given in Equation 14, we recovered
the redshift distribution shown in the right panel of
Figure 4. As mentioned in §2.2, this is a broad redshift
distribution for which the condition of Equation 11 is
violated. Hence, we do not expect our recovery to be
independent of the evolving bias of the FIRST sample.
However, our results allow us to say with some confidence
that the source redshift distribution extends to $z \sim 3$
and that there exist two distinct populations of sources,
one centered around $z \sim 1$ and a higher-redshift cohort
around $z \sim 2.5$. Selecting these two populations
independently is difficult from radio data alone, given the
lack of additional parameters available in FIRST, but can
be done via cross-matching FIRST sources with external
datasets (Schmidt et al., in preparation).

4. CONCLUSIONS 

We have presented a method to infer the redshift
distribution of arbitrary datasets, based on spatial
cross-correlations with a reference population, and we have
applied it to various datasets across the electromagnetic
spectrum. Previous works exploring the same avenue
(e.g. Newman 2008; Matthews & Newman 2010; Schulz
2010; Matthews & Newman 2012; McQuinn & White
2013) have focused on large scales where the halo bias is
linear. Here we advocate the use of clustering
measurements on all available scales and discuss the
benefits of using small-scale correlations, which tend to be
less affected by systematics in real data. In a companion
paper (Schmidt et al. 2013) we have used numerical
simulations to show the robustness and limitations of this
approach. We have also used this technique to search for
contamination of high-redshift Lyman-break galaxies by
low-redshift interlopers (Morrison et al. 2012).

Here, we have applied our method to estimate the
redshift distributions of SDSS luminous red galaxies,
emission line galaxies, sources from the WISE infrared
survey and the FIRST radio survey. For the first two
samples, located at low redshift, we have estimated their
redshift distributions using both quasars and absorber
systems as the reference population and obtained consistent
results. The simple, narrow redshift distributions recovered
for the LRGs and WISE sub-samples should be reliable
high-S/N estimates of the underlying redshift distributions
for these samples. For the broader, multi-peaked
distributions recovered for the ELG and FIRST samples, we
do not expect our recoveries to be unbiased estimates of the
distributions, but we are able to make reliable claims
about the redshifts of the sub-populations contained in
these samples. Additionally, an iterative approach
combining sample selection and redshift recovery has the
potential to greatly increase the purity of these samples by
separating low- and high-redshift populations.

This technique promises to be widely applicable to 
existing and upcoming sky surveys. It provides us with 
the invaluable ability to deproject the inherently 2d 
observations of the extragalactic sky. 